Discriminators of pseudoprogression and true progression in high-grade gliomas: A systematic review and meta-analysis

High-grade gliomas remain the most common primary brain tumour, with limited treatment options and early recurrence following adjuvant treatment. Differentiating true tumour progression (TTP) from treatment-related effects, or pseudoprogression (PsP), may critically influence subsequent management options. Structural MRI is routinely employed to evaluate treatment responses, but misdiagnosis of TTP or PsP may lead to continuation of ineffective or premature cessation of effective treatments, respectively. A systematic review and meta-analysis were conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses method. Embase, MEDLINE, Web of Science and Google Scholar were searched for methods applied to differentiate PsP and TTP, and studies were selected using pre-specified eligibility criteria. The sensitivity and specificity of included studies were summarised. Three of the identified methods were compared in a separate subgroup meta-analysis. Thirty studies assessing seven distinct neuroimaging methods in 1372 patients were included in the systematic review. The highest performing methods in the subgroup analysis were DWI (AUC = 0.93 [0.91–0.95]) and DSC-MRI (AUC = 0.93 [0.90–0.95]), compared to DCE-MRI (AUC = 0.90 [0.87–0.93]). 18F-fluoroethyltyrosine PET (18F-FET PET) and amide proton transfer-weighted MRI (APTw-MRI) also showed high diagnostic accuracy, but results were based on few low-powered studies. Both DWI and DSC-MRI performed with high sensitivity and specificity for differentiating PsP from TTP. Considering the technical parameters and feasibility of each identified method, the authors suggested that, at present, the DSC-MRI technique holds the most clinical potential.

Search strategy. The search strategy was devised in line with the recommendations of Bramer and colleagues 15. Embase, MEDLINE, and Web of Science were searched in their entirety on the 20th of May 2022. The first 200 results of a Google Scholar search were also included. The full search strategy is detailed in Supplementary Material A. Studies returned by the search were compiled and screened for data extraction in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) method 16. All titles and selected abstracts were screened on the basis of the inclusion criteria, and full texts were subsequently reviewed. The full selection process is shown in Fig. 1, and the full exclusion criteria are listed in Supplementary Material C. The protocol was registered with PROSPERO prior to searching (ID CRD42022218217).
Data extraction and quality assessment. Data were extracted onto a spreadsheet with the following variables: (1) first author and year of publication; (2) neuroimaging method of discrimination between TTP and PsP; (3) sample size and demographics, including proportion of patients that presented with PsP; (4) administration of radiotherapy / chemoradiation therapy / CCRT; (5) tumour grade; and (6) sensitivity and specificity measures. Extracted data were entered into Cochrane Review Manager 5.4 17. The review was conducted according to the Meta-analysis of Observational Studies in Epidemiology (MOOSE) proposal 18. The quality of included studies and risk of bias were assessed with the Quality Assessment of Diagnostic Accuracy (QUADAS-2) tool by two independent and blinded reviewers, C.T. and V.S. 19. Any disagreement was resolved by consensus.
In the included studies, sensitivity was defined as the proportion of patients with histopathologically confirmed TTP that presented as such with the modality of choice, or the true positive rate. Specificity was defined as the proportion of patients with histopathologically confirmed PsP that presented as such, or the true negative rate. A high sensitivity and specificity constitute a high diagnostic accuracy, which in turn represents the overall precision of clinical decisions. Studies that did not meet this definition of sensitivity and specificity were excluded during data extraction.
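The definitions above can be made concrete with a minimal Python sketch. It treats TTP as the positive class and PsP as the negative class, as in the text. The class and function names are hypothetical, and the example counts are those reported for the APTw-MRI study by Ma and colleagues later in this review (19 of 20 TTP and 11 of 12 PsP patients correctly identified).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table with TTP as the positive class (hypothetical helper type)."""
    tp: int  # confirmed TTP, classified as TTP
    fn: int  # confirmed TTP, classified as PsP
    tn: int  # confirmed PsP, classified as PsP
    fp: int  # confirmed PsP, classified as TTP


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: proportion of confirmed TTP identified as TTP."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: proportion of confirmed PsP identified as PsP."""
    return c.tn / (c.tn + c.fp)


# Counts taken from the APTw-MRI study described in this review.
ma = ConfusionCounts(tp=19, fn=1, tn=11, fp=1)
print(f"sensitivity = {sensitivity(ma):.0%}")  # 95%
print(f"specificity = {specificity(ma):.0%}")  # 92%
```

Studies that report only percentages, without the underlying counts, cannot be reduced to this table, which is why insufficient reporting was a reason for exclusion.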
Data synthesis and statistical analysis. All included studies were presented in a forest plot. A separate subgroup meta-analysis was performed to compare the three most prevalent methods among studies: DSC-MRI (n = 12), DCE-MRI (n = 4), and DWI (n = 12). For the systematic review, the primary outcomes were the sensitivity and specificity of each TTP and PsP discrimination method. For the subgroup meta-analysis, the primary outcomes were pooled sensitivity, specificity, and area under the summary receiver operating characteristics curve (SROC AUC) 20. Within-group heterogeneity was assessed using the I² statistic, which describes the proportion of variation in study results that can be attributed to heterogeneity 21. To account for the high heterogeneity of data across multiple modalities, data entered into Cochrane Review Manager 5.4 were analysed using a random effects model, which assumes individual effects are uncorrelated with independent variables. Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio (DOR) and SROC AUC were calculated using the MIDAS (meta-analytical integration of diagnostic test accuracy studies) package in STATA 22. Forest plots and a subgroup analysis SROC plot were generated in Cochrane Review Manager 5.4. For studies reporting median instead of mean age, the mean was estimated according to previously established methods 23.
Consent to participate. Informed consent was obtained from all individual participants included in the study.
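For readers less familiar with the summary measures named in the data synthesis paragraph, the following Python sketch shows the textbook formulas for the likelihood ratios, the diagnostic odds ratio, and the I² statistic derived from Cochran's Q. It is a simplified illustration only: it does not reproduce the models fitted by MIDAS or Cochrane Review Manager, the function names are hypothetical, and the input values in the usage lines are invented for demonstration.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for a single sensitivity/specificity pair."""
    if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
        raise ValueError("sens and spec must lie strictly between 0 and 1")
    lr_pos = sens / (1.0 - spec)   # how much a positive test raises the odds of TTP
    lr_neg = (1.0 - sens) / spec   # how much a negative test lowers the odds of TTP
    dor = lr_pos / lr_neg          # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def i_squared(effects: list[float], variances: list[float]) -> float:
    """I² (in %) from Cochran's Q using inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Invented inputs, for illustration only.
lr_pos, lr_neg, dor = likelihood_ratios(0.95, 0.92)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
print(f"I2 = {i_squared([1.0, 2.0, 3.0], [0.25, 0.25, 0.25]):.0f}%")  # 75%
```

Because I² is truncated at zero whenever Q falls below its degrees of freedom, a value of 0% in a small subgroup does not by itself show that the studies are homogeneous.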

Results
Search results. Following deduplication, the literature search yielded 911 abstracts. Following abstract screening, 70 full-text articles were retrieved and assessed for eligibility. Data were extracted from 39 articles, 9 of which did not report sufficient information to allow for calculation of sensitivity and specificity values and were therefore excluded. A total of 30 studies totalling 1372 patients were included in the systematic review (see Table 1).
All methods were summarised in the non-subgroup analysis (Fig. 2). Three studies reported 100% sensitivity and specificity: two using DSC-MRI, and one using DWI. FET PET showed high overall sensitivity and specificity in the included study 30. The included paper that applied APTw-MRI also reported a high diagnostic accuracy (sensitivity = 95%, specificity = 92%) 31. These modalities had insufficient data to be included in the subgroup analysis.
The lowest sensitivity (38%) for identifying tumour progression was reported by Young and colleagues 32, who examined visual signs such as enhancement on conventional MRI across 93 patients. The only other paper that applied conventional MRI acquisitions used a radiomics-based approach, but still reported relatively low diagnostic accuracy 33. The lowest specificity for true progression (23%) was reported by Kerkhof and colleagues 34, who differentiated PsP and TTP by visual interpretation of relative cerebral blood volume (rCBV) maps from DSC-MRI. Sensitivity tended to be higher than specificity in the majority of included studies.
Subgroup analysis. Three distinct methods across 25 studies reported sufficient data to include in a separate set of subgroup analyses: DSC-MRI (n = 12), DCE-MRI (n = 4), and DWI (n = 12).
Table 1. Details of included studies. APTw-MRI = amide proton transfer-weighted MRI; ASL = arterial spin labelling; CRT-TMZ = chemoradiotherapy with adjuvant temozolomide; NS = not specified; TTP = true progression. Conventional MRI included contrast-enhanced T1-weighted and T2-weighted acquisitions.
Heterogeneity for both DCE-MRI and DWI was calculated as I² = 0%, but high heterogeneity was reported in the DSC-MRI subgroup (I² = 79%). This heterogeneity was more pronounced in the reported specificity (I² = 85%). The DSC-MRI subgroup had the greatest amount of variation in methodology. However, true heterogeneity is unlikely to be zero in the DCE-MRI and DWI subgroups, and the small sample size may have led to an underestimation 35.
Quality assessment. Thirteen of the included 30 studies were determined to have a high risk of bias. Nearly all included studies had a high risk of bias in the index test section, because the parameter cut-off values were not pre-specified and were instead defined post hoc. High risk of bias was also apparent in the patient selection category. This was largely due to inclusion of patients who received steroids with standard chemoradiotherapy in some studies. Details of patient enrolment and inclusion/exclusion were also unclear in some studies, and nearly 40% of total included patients presented with PsP, which is higher than previous estimates 8. Applicability concerns were low across the included studies. The full risk of bias table and a more detailed summary of quality assessment across all studies are provided in Supplementary Material B1 & B2.

Discussion
The current systematic review and meta-analysis aimed to compare the most promising methods for differentiating PsP and TTP in patients with high-grade gliomas. A prior meta-analysis compared the utility of DWI and PWI (perfusion-weighted imaging) for discriminating PsP and TTP 36. Consistent with our results, it found the two modalities to be very comparable in terms of diagnostic accuracy. In contrast to the study by [...]
Assessment of imaging results using pre-specified parameter cut-off values was associated with higher sensitivity and specificity values in comparison to studies that relied on visual inspection. Kerkhof and colleagues 34 visually inspected rCBV colour maps to differentiate PsP from TTP, which yielded 72% sensitivity and 23% specificity. Both values were the lowest of the twelve studies included in the DSC-MRI subgroup and may have negatively skewed the averaged results. In contrast, two other included DSC-MRI studies 31,37 reported 100% sensitivity and 100% specificity using parameter cut-off values to differentiate PsP from TTP.
Direct comparisons between the diagnostic accuracies of DSC-MRI and DWI were provided in studies by Kim and colleagues 24, Prager and colleagues 25, and Shi and colleagues 26. Kim and colleagues found that the maximum CBV parameter of DSC-MRI and the mean apparent diffusion coefficient (ADC) of DWI differentiated PsP and TTP with the same sensitivity (79%) and specificity (45%) in 34 patients 24. In the other two studies, DSC-MRI outperformed DWI in specificity, but both reported similarly high sensitivity results 25,26.
Choi and colleagues 28 investigated the diagnostic accuracy of DSC-MRI and ASL. The sensitivity and specificity were 82.4% and 67.9%, respectively, for DSC-MRI and 79.4% and 64.3%, respectively, for ASL. A combination of the two modalities increased sensitivity and specificity to 94.1% and 82.1%, although this did not represent a significant increase in diagnostic accuracy (p = 0.133). Jovanovic and colleagues 27 separately assessed DSC-MRI and ASL, and quantitative analysis found both methods to yield 100% sensitivity in their patient sample. For specificity, ASL scored 73% compared to 100% for DSC-MRI. All four included diffusion/perfusion-based methods show clinical potential. DSC-MRI is currently the most widely employed, and its protocol and acquisition parameters are already well-defined 38.
FET PET. There has been increasing interest in the application of PET in differentiating between PsP and TTP.
One study included in this review used 18F-FET PET and found that the maximal tumour-to-brain ratio (TBRmax) differentiated between the two with 100% specificity and 91% sensitivity at a cut-off of 2.3, in a sample of 22 patients 30. A similar cut-off value of TBRmax = 2.55 was reported by Kebir and colleagues 39. In the same study, a linear discriminant analysis-based algorithm was trained on IDH-wildtype glioblastoma FET PET features, and its results were compared with a conventional FET PET analysis. The algorithm provided an AUC of 0.93, which was higher than the AUC of 0.68 for TBRmax.
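To illustrate how a parameter cut-off such as the TBRmax value above turns a continuous measurement into a diagnostic call, the sketch below applies a fixed threshold and scores the result against a reference standard. The patient values and labels are invented for demonstration, the function names are hypothetical, and the direction of the rule (values above the cut-off are called TTP) is an assumption made for this example rather than a detail stated in this review.

```python
def classify_by_cutoff(values: list[float], cutoff: float) -> list[bool]:
    """Call a case TTP (True) when its value exceeds the cut-off (assumed direction)."""
    return [v > cutoff for v in values]


def sens_spec(predicted_ttp: list[bool], is_ttp: list[bool]) -> tuple[float, float]:
    """Sensitivity and specificity with TTP as the positive class."""
    tp = sum(p and t for p, t in zip(predicted_ttp, is_ttp))
    fn = sum((not p) and t for p, t in zip(predicted_ttp, is_ttp))
    tn = sum((not p) and (not t) for p, t in zip(predicted_ttp, is_ttp))
    fp = sum(p and (not t) for p, t in zip(predicted_ttp, is_ttp))
    return tp / (tp + fn), tn / (tn + fp)


# Invented TBRmax values and reference labels (True = confirmed TTP).
tbr_max = [3.1, 2.8, 2.2, 2.6, 1.9, 2.0, 2.4, 1.7]
confirmed_ttp = [True, True, True, True, False, False, False, False]

calls = classify_by_cutoff(tbr_max, cutoff=2.3)
sens, spec = sens_spec(calls, confirmed_ttp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.75, 0.75
```

Moving the cut-off trades sensitivity against specificity, so a threshold chosen after inspecting the data tends to look better than one fixed in advance.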
Several other FET PET studies in the literature were found during the search but did not meet the inclusion criteria, since patients were scanned more than 6 months following diagnosis. This may be a contributing factor [...]
APTw-MRI. APTw-MRI was used to differentiate PsP and TTP in just one study. Ma and colleagues 31 found APTw-MRI to correctly identify 19 out of 20 patients in their TTP cohort (95%) and 11 out of 12 patients in their PsP cohort (92%). There was a marked signal increase in the TTP cohort compared to the PsP cohort, with an APTWmean cut-off of 2.42% and an APTWmax cut-off of 2.54%. This may be a promising method in the future, but further work on a larger dataset is required.

Combination methods. Multimodal approaches often demonstrate increased diagnostic accuracy and provide an additional layer of confidence compared to individual modalities. It is reasonable to assume the highest diagnostic accuracy would be achieved from the combination of results from already established modalities. However, the trade-off is the accompanying increase in cost and acquisition time. Regardless, with increasing availability of several above-mentioned modalities, the advantage of combination methods could be considered on a case-by-case basis. Three combination methods were included in the present review. A combination of Ktrans and rCBV maps obtained from DCE-MRI and DSC-MRI acquisitions, respectively, reported high sensitivity (88%) and specificity (91%) in a cohort of 98 patients 29. The maps could not discriminate between PsP and TTP in the cohort when used individually. Choi and colleagues 28 combined DSC-MRI with ASL and reported sensitivity and specificity of 94% and 82%, respectively, also finding the combination values higher than the individual methods. Lastly, Shi and colleagues 26 found that using DSC-MRI and DWI separately produced specificities of 83% and 58%, respectively. When used in combination, specificity increased to 92% overall. However, the combination also led to a decrease in sensitivity to 86% overall, despite DWI alone accurately identifying all 22 cases of tumour progression.
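The pattern described above, in which combining two tests raises specificity at the cost of sensitivity, is what one would expect from a "both tests must be positive" rule. The sketch below works through that arithmetic under the strong assumption that the two tests err independently given the true diagnosis. It is not a reconstruction of how any included study combined its modalities. The function names are hypothetical, the specificities (83% and 58%) and the 100% DWI sensitivity are taken from the paragraph above, and the DSC-MRI sensitivity of 0.90 is an invented placeholder.

```python
def combine_and(sens_a: float, spec_a: float,
                sens_b: float, spec_b: float) -> tuple[float, float]:
    """Call TTP only if both tests are positive (independence assumed)."""
    sens = sens_a * sens_b                          # both must detect a true TTP
    spec = 1.0 - (1.0 - spec_a) * (1.0 - spec_b)    # both must err to miscall a PsP
    return sens, spec


def combine_or(sens_a: float, spec_a: float,
               sens_b: float, spec_b: float) -> tuple[float, float]:
    """Call TTP if either test is positive (independence assumed)."""
    sens = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)    # both must miss a true TTP
    spec = spec_a * spec_b                          # both must clear a true PsP
    return sens, spec


# DWI: sensitivity 1.00, specificity 0.58 (from the text).
# DSC-MRI: specificity 0.83 (from the text), sensitivity 0.90 (hypothetical).
and_sens, and_spec = combine_and(1.00, 0.58, 0.90, 0.83)
or_sens, or_spec = combine_or(1.00, 0.58, 0.90, 0.83)
print(f"AND rule: sensitivity = {and_sens:.2f}, specificity = {and_spec:.2f}")
print(f"OR rule:  sensitivity = {or_sens:.2f}, specificity = {or_spec:.2f}")
```

Under these assumptions the AND rule can never exceed the sensitivity of the weaker test, and the OR rule can never exceed the specificity of the weaker test, so the choice of rule depends on which error is costlier for the patient.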
Clinical utility. Despite the large number of studies reporting the diagnostic potential of different imaging protocols, these protocols have not entered routine clinical use. A summary of the main clinically relevant parameters is presented in Table 3 41–43.
An inherent limitation of using perfusion-weighted imaging is that while perfusion parameters are generally lower in PsP, the associated inflammatory response is likely to influence perfusion and lead to increased perfusion parameters such as rCBV 44 . Similar effects have been seen with DWI as a result of radiation necrosis, suggesting decreased ADC may not always reflect a high cellularity and TTP 45 . However, both PWI and DWI appear to demonstrate high overall diagnostic accuracy.
As the most commonly used perfusion MRI modality, DSC-MRI may be preferable for a standard protocol due to its high clinical availability and short acquisition time, which can be under one minute 43. The standardisation of rCBV discriminating cut-off values is limited by numerous potential imaging and data processing artifacts impeding accurate perfusion quantification, as outlined by Willats and Calamante's 39 steps for accurate perfusion of DSC-MRI data 46. One of the most widely discussed issues is the possibility of contrast agent leakage into extracellular tissue, known as the T1 shine-through effect 47. Application of model-based leakage corrections is advised for single-echo gadolinium-based DSC-MRI to account for the extent of vascular permeability 48.
DCE-MRI has a high signal-to-noise ratio compared to the other MR-perfusion techniques 49. The main limitation of this method is the relatively long data acquisition time, often over several minutes 50. Similar to other perfusion techniques, full quantification remains challenging due to difficulties with DCE tracer modelling. Efforts are currently under way to resolve issues related to accurate quantification of perfusion techniques. The establishment of taskforces such as the Quantitative Imaging Biomarkers Alliance will facilitate clinical implementation of methods by providing reference measures and guidelines for best practices 51.
ASL was a less frequently reported discriminating method compared to other perfusion methods. The main advantage of ASL over DSC-MRI is that it does not require a gadolinium-based bolus injection. It may therefore be more suitable for patients with contraindications to administration of contrast agents 52. Furthermore, ASL can acquire entirely quantitative values of cerebral blood flow (CBF). A non-significant increase in sensitivity and specificity was observed when CBF measures acquired using ASL were combined with DSC-MRI, compared to use of the methods individually 28. Jovanovic and colleagues 27 concluded that the diagnostic accuracy of ASL was sufficient to replace DSC-MRI and therefore avoid repeat follow-up contrast injections. An important consideration for ASL is the longer acquisition time of 8-10 min at 1.5 T and 4-5 min at 3 T, as well as a significantly lower signal-to-noise ratio (SNR) compared to other perfusion methods 43.
APTw is a novel imaging technique demonstrated to detect the increased mobile protein content in brain tumours 53. Its full potential is yet to be established, as U.S. Food and Drug Administration (FDA) approval of 3D-APTw for use on 3 T clinical MRI scanners was granted only in 2018 54. However, APTw examinations may be time consuming (~ 5-10 min) and are susceptible to magnetic field inhomogeneities 55. Some work aims to optimise the signal-to-noise ratio and image acquisition speed 56. APTw is a promising method with initial studies reporting a high diagnostic accuracy, but larger datasets are needed to compare its performance against other techniques.
Despite the high reported sensitivity and specificity of 18F-FET PET, a long acquisition time of 50 min, as reported by Galldiks and colleagues 30, limits its clinical potential. Since 18F-FET PET relies on administration of a labelled amino acid analogue, patients in the study were also required to fast for at least 12 h before scanning. In contrast to other radiotracers, the half-life of fluorine-18 is long enough to allow for off-site production. The requirement for pharmacokinetic analysis with compartment modelling 57 further limits potential for clinical implementation.
Future directions. Quantitative methods offer a more objective approach towards finding patterns in clinical data and enable more accurate diagnosis compared to qualitative methods 58,59. Jang and colleagues 60 recently applied a deep learning approach using convolutional neural networks to the differentiation of pseudoprogression and true progression and achieved a sensitivity of 87% and a specificity of 94.5%. Another study found a benefit of the combination of hypervascularity, cellularity and permeability parameters over single parameter measurements to distinguish the conditions 61. The need for large datasets for training and testing radiomics models has led to a general lack of power; future research should therefore focus on increasing accessibility and data availability. National support for the scaling of technology and the potential use of artificial intelligence to aid clinical decision making has been outlined in the NHS Long Term Plan 62.

Conclusion
Our systematic review and meta-analysis found DWI and DSC-MRI to have the highest diagnostic accuracy for differentiating between PsP and TTP. Considering the acquisition time and availability, DSC-MRI holds high potential for clinical implementation. The risk associated with the repeat contrast agent injections required for DSC-MRI could be offset by substituting ASL for DSC-MRI. There was a clear advantage of using parameter cut-offs over methods that relied on qualitative visual inspection. The diagnostic accuracy of methods such as PET, APTw-MRI, clinically feasible combination methods, and quantitative multiparametric techniques should be investigated in large-scale studies.